Effects of COVID-19 pandemic on hospital-aquired infections

BMIN503/EPID600 Final Project

Author

Kevin Mears


Use this template to complete your project throughout the course. Your Final Project presentation will be based on the contents of this document. Replace the title/name above and text below with your own, but keep the headers. Feel free to change the theme and other display settings, although this is not required.

1 Overview

Give a brief a description of your project and its goal(s), what data you are using to complete it, and what two faculty/staff in different fields you have spoken to about your project with a brief summary of what you learned from each person. Include a link to your final project GitHub repository.

The goal of my final project is to determine how the COVID-19 pandemic and the surge in hospital visits, changes in cleaning/hygiene, and people staying at home affected the incidence of hospital-associated infections. I identified the 2020 data set from the CDC National Hospital Care Survey (NHCS) which contains 132,694 inpatients and 388,753 emergency department visits. It contains information including patient age, sex, the month they were discharged, and their diagnoses. I will use this data to determine how the incidence of hospital-associated infections changed month-to-month in 2020 and whether there are correlations with other parameters (e.g. COVID-19 diagnosis).

I spoke with Dr. Joseph Zackular (Assistant Professor of Pathology and Laboratory Medicine at CHOP) who is a bacteriologist and an expert on C. difficile, the most common hospital-acquired infection. He suggested I stratify the dataset based on sex and age as demographics that might affect hospital infections. He suggested I look at bacterial pneumonia as a positive control as it is commonly associated with COVID co-infection.

I also spoke with Dr. Kyle Bittinger (Bioinformatics Laboratory Director, CHOP Microbiome Center) who generously offered to help with the code where needed.

Final Project Github Repo

2 Introduction

Describe the problem addressed, its significance, and some background to motivate the problem. This should extend what is in the Section 1.

Explain why your problem is interdisciplinary, what fields can contribute to its understanding, and incorporate background related to what you learned from meeting with faculty/staff.

Hospital-acquired infections (HAI) are infections acquired after hospital admission. It includes catheter-associated urinary tract infections, central line-associated blood stream infections, surgical site infections, ventilator-associated and hospital-acquired pneumonia, and Clostridioides difficile infections. The risk of HAIs depend on many factors, including the facility’s disinfection and infection prevention practices, the patient’s immune status, length of stay in the hospital, co-morbitities, ventilator support, and use of invasive procedures. Receipt of antibiotics is one of the major risk factors for developing Clostridioides difficile infection and other multidrug resistant bacterial infections (e.g. Vancomycin resistant enteroccus or Methicillin resistant Staphylococcus aureus). According to the CDC, about 4% of hospitalized patients experience at least one HAI, with an estimated 648,000 cases in 2011; these are dominated by pneumonia, surgical site infections, gastrointestinal infections, UTIs, and bloodstream infections. Clostidioides difficile infection is the most common cause of hospital acquired infections and can cause severe, potentially life-threatening colitis. Common causes of hospital-associated and ventilator-associated pneumonia are Staphylococcus aureus, Pseudomonas aeruginosa, E. coli, and Klebsiella penumoniae. Common causes of catheter-associated UTIs are Enterococcus, Staphylcoccus aureus, Pseudomonas, Proteus, Klebsiella, and Candida. Common causes of surgical site infections include Staphylococcus aureus, other Staphylococcus, Enterococcus, E. coli, Pseudomonas aeruginosa, Enterobacter, and Klebsiella.

The COVID-19 pandemic led to a surge in hospital visits, overwhelming our healthcare system and led to over a million deaths. Coincidently, disinfection practices intensified in public facilities including hospitals. Many causes of hospital-associated infections, including C. difficile, MRSA, VRE, norovirus are difficult to disinfect using traditional cleaning methods. The present study aims to determine how the COVID-19 pandemic affected the incidence of hospital-associated infections in 2020.

This is an interdisciplinary question because it requires an understanding of microbiology and the various factors that affect pathogenesis, epidemiology to appreciate the how disease is spread in a hospital setting, and data science for bioinformatics analysis of hospital survey data.

Sources:

  • Monegro et al. Hospital-Acquired Infections. (2023). In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing. https://www.ncbi.nlm.nih.gov/books/NBK441857/

  • Dewey et al. Increased Use of Disinfectants During the COVID-19 Pandemic and Its Potential Impacts on Health and Safety. (2021) Acs Chem Health Safety.

  • Dancer, S.J. Controlling Hospital-Acquired Infection: Focus on the Role of the Environment and New Technologies for Decontamination. (2014) Clin Microbiol Rev.

3 Methods

Describe the data used and general methodological approach used to address the problem described in the Section 2. Subsequently, incorporate full R code necessary to retrieve and clean data, and perform analysis. Be sure to include a description of code so that others (including your future self) can understand what you are doing and why.

library("tidyverse")
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("ggplot2")
library("cowplot")

Attaching package: 'cowplot'

The following object is masked from 'package:lubridate':

    stamp
# Read in NHCS 2020 inpatient Public Use File R Dataset
#https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHCS/2020/R/nhcs2020ip_r.rds
url <- "https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHCS/2020/R/nhcs2020ip_r.rds"
nhcs2020ip <- read_rds(url)
# inpatient data set
#pull out variables list
var_2020 <- nhcs2020ip$variables

#select useful columns
var_2020_select <- select(var_2020, 1:71)

#variable names 
varnames_2020 <- colnames(var_2020_select)

#pull out primary diagnosis (DX1)
diag1_2020 <- var_2020_select$DX1

#determine the number of primary C. diff infections (ICD-10 diagnosis code: A047)
primaryCdiff_2020 <- sum(diag1_2020 == "A047", na.rm = TRUE)
primaryCdiff_2020
[1] 245
#there were 245 cases of primary C.diff infection
#combine all diagnosis
alldiag_2020 <- var_2020_select |>
  pivot_longer(
    cols = c(DX1:DX30), 
    values_to = "DX"
  )

#all COVID infections
all_covid <- sum(alldiag_2020 =="U071", na.rm = TRUE)
all_covid
[1] 6452
#6452 total COVID-19 diagnosis 

#all C.diff infection
allCdiff <- sum(alldiag_2020 == "A047", na.rm = TRUE)
allCdiff
[1] 992
#992 total C.diff infection
#subset COVID-19 patients
covid_patients <- filter(alldiag_2020, DX =="U071")
covid_patients <- covid_patients |>
  filter(!is.na(AGE))
#re-level age
covid_patients <- covid_patients |>
  mutate(age.simplified = cut(AGE, 
                              breaks = c(0, 5, 10, 15, 20, seq(30, 100, by = 10)), 
                              right = FALSE, 
                              labels = c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")))

count(covid_patients, age.simplified)
# A tibble: 11 × 2
   age.simplified     n
   <fct>          <int>
 1 less than 5       45
 2 5-9               17
 3 10-14             23
 4 15-19             61
 5 20-29            334
 6 30-39            509
 7 40-49            648
 8 50-59           1071
 9 60-69           1392
10 70-79           1219
11 80-89           1130
count(covid_patients, AGE)
# A tibble: 86 × 2
     AGE     n
   <dbl> <int>
 1     0    25
 2     1     7
 3     2     7
 4     3     3
 5     4     3
 6     5     3
 7     7     2
 8     8     9
 9     9     3
10    10     2
# ℹ 76 more rows
count(covid_patients, SEX)
# A tibble: 2 × 2
  SEX        n
  <fct>  <int>
1 Male    3369
2 Female  3080
count(covid_patients, LOS) #length of stay
# A tibble: 17 × 2
     LOS     n
   <dbl> <int>
 1     0    50
 2     1   459
 3     2   717
 4     3   708
 5     4   634
 6     5   608
 7     6   491
 8     7   376
 9     8   298
10     9   243
11    10   235
12    11   198
13    12   158
14    13   130
15    14   116
16    15  1027
17    NA     1
count(covid_patients, LOS_30DAYS) #greater than 30 days
# A tibble: 3 × 2
  LOS_30DAYS     n
  <lgl>      <int>
1 FALSE       6160
2 TRUE         288
3 NA             1
count(covid_patients, DISCHARGE_MONTH)
# A tibble: 11 × 2
   DISCHARGE_MONTH     n
             <dbl> <int>
 1               1     1
 2               3     5
 3               4  1242
 4               5   806
 5               6   419
 6               7   443
 7               8   362
 8               9   251
 9              10   453
10              11   913
11              12  1554
count(covid_patients, DISCHARGE_STATUS)
# A tibble: 9 × 2
  DISCHARGE_STATUS                            n
  <fct>                                   <int>
1 Routine to home                          3178
2 Left against medical advice                74
3 Transfer to short term facility            83
4 Transfer to long term facility            249
5 Home health care                          854
6 Hospice care - home or medical facility   178
7 Other                                     880
8 Dead                                      875
9 <NA>                                       78
#visualization
ggplot(covid_patients, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("COVID-19 cases by sex") +
  theme_bw()

ggplot(covid_patients, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
  ggtitle("COVID-19 cases by age") +
  theme_bw()

ggplot(covid_patients, aes(x = LOS)) +
  geom_bar() +
  labs(x = "Length of stay (up to 14 days)", y = "Count") +
  ggtitle("COVID-19 cases by length of stay") +
  theme_bw()
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_count()`).

#greater than 15 binned in data

ggplot(covid_patients, aes(x = LOS_30DAYS)) +
  geom_bar() +
  labs(x = "Length of stay greater than 30 days", y = "Count") +
  ggtitle("COVID-19 cases by length of stay greater than 30 days") +
  theme_bw()

ggplot(covid_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  ggtitle("COVID-19 cases by discharge month") +
  theme_bw()

#subset bacterial pneumonia patients
bactpneumo_patients <- filter(alldiag_2020, DX %in% c("J13", "J14", "J150", "J151", "J1520", "J15211", "J1529", "J153", "J154", "J155", "J1561", "J1569", "J157", "J158", "J159", "J160"))
#J13     Pneumonia due to Streptococcus pneumoniae
#J14     Pneumonia due to Hemophilus influenzae
#J150    Pneumonia due to Klebsiella pneumoniae
#J151    Pneumonia due to Pseudomonas
#J1520   Pneumonia due to staphylococcus, unspecified
#J15211  Pneumonia due to Methicillin susceptible Staphylococcus aureus
#J15212  Pneumonia due to Methicillin resistant Staphylococcus aureus
#J1529   Pneumonia due to other staphylococcus
#J153    Pneumonia due to streptococcus, group B
#J154    Pneumonia due to other streptococci
#J155    Pneumonia due to Escherichia coli
#J1561   Pneumonia due to Acinetobacter baumannii
#J1569   Pneumonia due to other Gram-negative bacteria
#J157    Pneumonia due to Mycoplasma pneumoniae
#J158    Pneumonia due to other specified bacteria
#J159    Unspecified bacterial pneumonia
#J160    Chlamydial pneumonia
#re-level age
bactpneumo_patients <- bactpneumo_patients |>
  mutate(age.simplified = cut(AGE, 
                              breaks = c(0, 5, 10, 15, 20, seq(30, 100, by = 10)), 
                              right = FALSE, 
                              labels = c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")))

count(bactpneumo_patients, age.simplified)
# A tibble: 11 × 2
   age.simplified     n
   <fct>          <int>
 1 less than 5       72
 2 5-9               18
 3 10-14             12
 4 15-19             21
 5 20-29             47
 6 30-39             91
 7 40-49            118
 8 50-59            224
 9 60-69            427
10 70-79            375
11 80-89            267
count(bactpneumo_patients, SEX)
# A tibble: 2 × 2
  SEX        n
  <fct>  <int>
1 Male     971
2 Female   701
count(bactpneumo_patients, LOS) #length of stay
# A tibble: 16 × 2
     LOS     n
   <dbl> <int>
 1     0     6
 2     1    41
 3     2    95
 4     3   124
 5     4   138
 6     5   116
 7     6    93
 8     7    78
 9     8    81
10     9    70
11    10    63
12    11    62
13    12    51
14    13    39
15    14    37
16    15   578
count(bactpneumo_patients, LOS_30DAYS) #greater than 30 days
# A tibble: 2 × 2
  LOS_30DAYS     n
  <lgl>      <int>
1 FALSE       1438
2 TRUE         234
count(bactpneumo_patients, DISCHARGE_MONTH)
# A tibble: 12 × 2
   DISCHARGE_MONTH     n
             <dbl> <int>
 1               1   186
 2               2   129
 3               3   187
 4               4   171
 5               5   162
 6               6   109
 7               7   102
 8               8   111
 9               9    82
10              10   119
11              11   138
12              12   176
count(bactpneumo_patients, DISCHARGE_STATUS)
# A tibble: 9 × 2
  DISCHARGE_STATUS                            n
  <fct>                                   <int>
1 Routine to home                           567
2 Left against medical advice                11
3 Transfer to short term facility            21
4 Transfer to long term facility            175
5 Home health care                          263
6 Hospice care - home or medical facility    69
7 Other                                     254
8 Dead                                      306
9 <NA>                                        6
#visualization
ggplot(bactpneumo_patients, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("Bacterial penumonia cases by sex") +
  theme_bw()

ggplot(bactpneumo_patients, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
    ggtitle("Bacterial penumonia cases by age") +
  theme_bw()

ggplot(bactpneumo_patients, aes(x = LOS)) +
  geom_bar() +
  labs(x = "Length of stay (up to 14 days)", y = "Count") +
    ggtitle("Bacterial penumonia cases by length of stay") +
  theme_bw()

#greater than 15 binned in data

ggplot(bactpneumo_patients, aes(x = LOS_30DAYS)) +
  geom_bar() +
  labs(x = "Length of stay greater than 30 days", y = "Count") +
  ggtitle("Bacterial penumonia cases by length of stay greater than 30 days") +
  theme_bw()

ggplot(bactpneumo_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
    ggtitle("Bacterial penumonia cases by discharge month") +
  theme_bw()

#subset C.diff patients
Cdiff_patients <- filter(alldiag_2020, DX == "A047")
#plot number of cases by demographics
#re-level age
Cdiff_patients <- Cdiff_patients |>
  mutate(age.simplified = cut(AGE, 
                              breaks = c(0, 5, 10, 15, 20, seq(30, 100, by = 10)), 
                              right = FALSE, 
                              labels = c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")))

count(Cdiff_patients, age.simplified)
# A tibble: 11 × 2
   age.simplified     n
   <fct>          <int>
 1 less than 5        7
 2 5-9                7
 3 10-14              8
 4 15-19             15
 5 20-29             34
 6 30-39             66
 7 40-49             83
 8 50-59            142
 9 60-69            235
10 70-79            220
11 80-89            175
count(Cdiff_patients, SEX)
# A tibble: 2 × 2
  SEX        n
  <fct>  <int>
1 Male     465
2 Female   527
count(Cdiff_patients, LOS) #length of stay
# A tibble: 16 × 2
     LOS     n
   <dbl> <int>
 1     0     7
 2     1    31
 3     2    77
 4     3    82
 5     4    94
 6     5    84
 7     6    79
 8     7    77
 9     8    51
10     9    48
11    10    32
12    11    21
13    12    36
14    13    30
15    14    25
16    15   218
count(Cdiff_patients, LOS_30DAYS) #greater than 30 days
# A tibble: 2 × 2
  LOS_30DAYS     n
  <lgl>      <int>
1 FALSE        903
2 TRUE          89
count(Cdiff_patients, DISCHARGE_MONTH)
# A tibble: 12 × 2
   DISCHARGE_MONTH     n
             <dbl> <int>
 1               1   107
 2               2    82
 3               3    93
 4               4    64
 5               5    70
 6               6    84
 7               7    76
 8               8    73
 9               9    99
10              10    81
11              11    84
12              12    79
count(Cdiff_patients, DISCHARGE_STATUS)
# A tibble: 9 × 2
  DISCHARGE_STATUS                            n
  <fct>                                   <int>
1 Routine to home                           380
2 Left against medical advice                 9
3 Transfer to short term facility            18
4 Transfer to long term facility             76
5 Home health care                          205
6 Hospice care - home or medical facility    30
7 Other                                     193
8 Dead                                       74
9 <NA>                                        7
#visualization
ggplot(Cdiff_patients, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("C. difficile infection by sex") +
  theme_bw()

ggplot(Cdiff_patients, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
  ggtitle("C. difficile infection by age") +
  theme_bw()

ggplot(Cdiff_patients, aes(x = LOS)) +
  geom_bar() +
  labs(x = "Length of stay (up to 14 days)", y = "Count") +
  ggtitle("C. difficile infection by length of stay") +
  theme_bw()

#greater than 15 binned in data

ggplot(Cdiff_patients, aes(x = LOS_30DAYS)) +
  geom_bar() +
  labs(x = "Length of stay greater than 30 days", y = "Count") +
  ggtitle("C. difficile infection by length of stay greater than 30 days") +
  theme_bw()

ggplot(Cdiff_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  ggtitle("C. difficile infection by discharge month") +
  theme_bw()

#MRSA
mrsa_patients <- filter(alldiag_2020, DX %in% c("A4102", "A4902", "B9562", "J15212"))
#A4102   Sepsis due to Methicillin resistant Staphylococcus aureus
#A4902   Methicillin resistant Staphylococcus aureus infection, unspecified site
#B9562   Methicillin resistant Staphylococcus aureus infection as the cause of diseases classified elsewhere
#J15212  Pneumonia due to Methicillin resistant Staphylococcus aureus

# no MRSA cases in dataset
#enterococcus
entero_patients <- filter(alldiag_2020, DX %in% c("A4181", "B952"))
#A4181   Sepsis due to Enterococcus
#B952    Enterococcus as the cause of diseases classified elsewhere
#plot number of cases by demographics
#re-level age
entero_patients <- entero_patients |>
  mutate(age.simplified = cut(AGE, 
                              breaks = c(0, 5, 10, 15, 20, seq(30, 100, by = 10)), 
                              right = FALSE, 
                              labels = c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")))

count(entero_patients, age.simplified)
# A tibble: 11 × 2
   age.simplified     n
   <fct>          <int>
 1 less than 5       13
 2 5-9                2
 3 10-14              2
 4 15-19              9
 5 20-29             19
 6 30-39             32
 7 40-49             42
 8 50-59             81
 9 60-69            124
10 70-79            115
11 80-89            103
count(entero_patients, SEX)
# A tibble: 2 × 2
  SEX        n
  <fct>  <int>
1 Male     294
2 Female   248
count(entero_patients, LOS) #length of stay
# A tibble: 15 × 2
     LOS     n
   <dbl> <int>
 1     1    12
 2     2    24
 3     3    43
 4     4    54
 5     5    42
 6     6    40
 7     7    39
 8     8    34
 9     9    22
10    10    24
11    11    20
12    12    20
13    13    12
14    14    15
15    15   141
count(entero_patients, LOS_30DAYS) #greater than 30 days
# A tibble: 2 × 2
  LOS_30DAYS     n
  <lgl>      <int>
1 FALSE        491
2 TRUE          51
count(entero_patients, DISCHARGE_MONTH)
# A tibble: 12 × 2
   DISCHARGE_MONTH     n
             <dbl> <int>
 1               1    52
 2               2    41
 3               3    44
 4               4    34
 5               5    51
 6               6    39
 7               7    56
 8               8    31
 9               9    47
10              10    48
11              11    46
12              12    53
count(entero_patients, DISCHARGE_STATUS)
# A tibble: 8 × 2
  DISCHARGE_STATUS                            n
  <fct>                                   <int>
1 Routine to home                           160
2 Left against medical advice                 4
3 Transfer to short term facility             6
4 Transfer to long term facility             46
5 Home health care                          161
6 Hospice care - home or medical facility    27
7 Other                                     116
8 Dead                                       22
#visualization
ggplot(entero_patients, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("Enterococcus infections by sex") +
  theme_bw()

ggplot(entero_patients, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
  ggtitle("Enterococcus infections by age") +
  theme_bw()

ggplot(entero_patients, aes(x = LOS)) +
  geom_bar() +
  labs(x = "Length of stay (up to 14 days)", y = "Count") +
  ggtitle("Enterococcus infections by length of stay") +
  theme_bw()

#greater than 15 binned in data

ggplot(entero_patients, aes(x = LOS_30DAYS)) +
  geom_bar() +
  labs(x = "Length of stay greater than 30 days", y = "Count") +
  ggtitle("Enterococcus infections by length of stay greater than 30 days") +
  theme_bw()

ggplot(entero_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  ggtitle("Enterococcus infections by discharge month") +
  theme_bw()

#infections from catheter
infcatheter_patients <- filter(alldiag_2020, DX %in% c("T80211A", "T80211D ", "T80211S", "T80212A", "T80212D", "T80212S", "T80218A", "T80218D", "T80218S", "T80219A", "T80219D", "T80219S"))
#T80211A Bloodstream infection due to central venous catheter, initial encounter
#T80211D Bloodstream infection due to central venous catheter, subsequent encounter
#T80211S Bloodstream infection due to central venous catheter, sequela
#T80212A Local infection due to central venous catheter, initial encounter
#T80212D Local infection due to central venous catheter, subsequent encounter
#T80212S Local infection due to central venous catheter, sequela
#T80218A Other infection due to central venous catheter, initial encounter
#T80218D Other infection due to central venous catheter, subsequent encounter
#T80218S Other infection due to central venous catheter, sequela
#T80219A Unspecified infection due to central venous catheter, initial encounter
#T80219D Unspecified infection due to central venous catheter, subsequent encounter
#T80219S Unspecified infection due to central venous catheter, sequela

# no catheter-associated infections in dataset
# ventilator-associated pneumonia
ventpenumo_patients <- filter(alldiag_2020, DX =="J95851")
#J95851  Ventilator associated pneumonia

# no cases in dataset
# surgical site infections
surginf_patients <- filter(alldiag_2020, DX %in% c("T8141XA", "T8141XD", "T8141XS", "T8142XA", "T8142XD", "T8142XS", "T8143XA", "T8143XD", "T8143XS", "O8600", "O8601", "O8602", "O8603", "O8604", "08609"))

#T8141XA Infection following a procedure, superficial incisional surgical site, initial encounter
#T8141XD Infection following a procedure, superficial incisional surgical site, subsequent encounter
#T8141XS Infection following a procedure, superficial incisional surgical site, sequela
#T8142XA Infection following a procedure, deep incisional surgical site, initial encounter
#T8142XD Infection following a procedure, deep incisional surgical site, subsequent encounter
#T8142XS Infection following a procedure, deep incisional surgical site, sequela
#T8143XA Infection following a procedure, organ and space surgical site, initial encounter
#T8143XD Infection following a procedure, organ and space surgical site, subsequent encounter
#T8143XS Infection following a procedure, organ and space surgical site, sequela
#O8600   Infection of obstetric surgical wound, unspecified
#O8601   Infection of obstetric surgical wound, superficial incisional site
#O8602   Infection of obstetric surgical wound, deep incisional site
#O8603   Infection of obstetric surgical wound, organ and space site
#O8604   Sepsis following an obstetrical procedure
#O8609   Infection of obstetric surgical wound, other surgical site

# no cases in dataset
#plot month data together 
covid_month <- ggplot(covid_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  theme_bw()

#adjust margins for cowplot
covid_month <- covid_month + theme(plot.margin = margin(t = 20, r = 10, b = 10, l = 10))

bactpneumo_month <- ggplot(bactpneumo_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  theme_bw()

#adjust margins for cowplot
bactpneumo_month <- bactpneumo_month + theme(plot.margin = margin(t = 20, r = 10, b = 10, l = 10))

cdiff_month <- ggplot(Cdiff_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  theme_bw()

#adjust margins for cowplot
cdiff_month <- cdiff_month + theme(plot.margin = margin(t = 20, r = 10, b = 10, l = 10))

entero_month <- ggplot(entero_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  theme_bw()

entero_month <- entero_month + theme(plot.margin = margin(t = 20, r = 10, b = 10, l = 10))

plot_grid(covid_month, bactpneumo_month, cdiff_month, entero_month, labels = c('COVID-19', 'Bacterial pneumonia', 'C. difficile', 'Enterococcus'), label_size = 12, label_x = -0.05, label_y = 1)

  • Bacterial pneumonia follows a similar pattern to COVID-19 as expected, but C. difficile and Enterococcus infections appears to be pretty consistent throughout the year.

  • There were no cases of infections associated with surgeries, infections from catheter use, methicillin-resistant Staphylococcus aureus infections, or ventilator-associated pneumonia in the dataset.

# proportions by discharge month and age

ggplot(covid_patients, aes(x = DISCHARGE_MONTH, fill = age.simplified)) +
  geom_bar(position = "fill") + 
  scale_fill_brewer(palette = "Spectral") +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Discharge Month", y = "Proportion", fill = "Age") +
  ggtitle("COVID-19 cases") +
  theme_bw()

ggplot(bactpneumo_patients, aes(x = DISCHARGE_MONTH, fill = age.simplified)) +
  geom_bar(position = "fill") + 
  scale_fill_brewer(palette = "Spectral") +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Discharge Month", y = "Proportion", fill = "Age") +
  ggtitle("Bacterial pneumonia cases") +
  theme_bw()

ggplot(Cdiff_patients, aes(x = DISCHARGE_MONTH, fill = age.simplified)) +
  geom_bar(position = "fill") + 
  scale_fill_brewer(palette = "Spectral") +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Discharge Month", y = "Proportion", fill = "Age") +
  ggtitle("C. difficile infections") +
  theme_bw()

ggplot(entero_patients, aes(x = DISCHARGE_MONTH, fill = age.simplified)) +
  geom_bar(position = "fill") + 
  scale_fill_brewer(palette = "Spectral") +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Discharge Month", y = "Proportion", fill = "Age") +
  ggtitle("Enterococcus infections") +
  theme_bw()

# Add descriptions to diagnosis
# Read the lines from the file
lines <- readLines("./2021-code-descriptions-tabular-order/icd10cm_codes_2021.txt")

# Split each line by 2 or more spaces
split_lines <- strsplit(lines, "\\s{2,}")

# Find the maximum number of columns
max_length <- max(sapply(split_lines, length))

# Pad shorter rows with NA
padded_lines <- lapply(split_lines, function(x) {
  length(x) <- max_length  # Extend the length
  return(x)  # Returns the padded line
})

# Convert the list to a data frame
icd_code <- do.call(rbind, lapply(padded_lines, function(x) as.data.frame(t(x), stringsAsFactors = FALSE)))

# Optionally, set column names if needed
colnames(icd_code) <- c("DX", "Description")

alldiag_desc <- left_join(alldiag_2020, icd_code)
Joining with `by = join_by(DX)`
# diagnosis arranged by number of cases
alldiag_desc |>
  group_by(Description) |>
  summarize(count = n()) |>
  arrange(desc(count))
# A tibble: 3,814 × 2
   Description                                              count
   <chr>                                                    <int>
 1 <NA>                                                   3217815
 2 Essential (primary) hypertension                         30922
 3 Hyperlipidemia, unspecified                              27947
 4 Acute kidney failure, unspecified                        18190
 5 Gastro-esophageal reflux disease without esophagitis     17444
 6 Single live birth                                        12200
 7 Encounter for immunization                               11723
 8 Major depressive disorder, single episode, unspecified   11503
 9 Anxiety disorder, unspecified                            11406
10 Hypo-osmolality and hyponatremia                         10649
# ℹ 3,804 more rows
# determine whether discharge month is associated with COVID 

# filter DISCHARGE_MONTH and DX columns
dx_month <- alldiag_desc |>
  select(c(DISCHARGE_MONTH, DX))

# Remove NA in DX
dx_month_narm <- dx_month[!is.na(dx_month$DX), ]

# Remove NA in Discharge Month
dx_month_narm <- dx_month_narm[!is.na(dx_month_narm$DISCHARGE_MONTH), ]

# create new column with COVID infection as factor
covid_binary_month <- dx_month_narm |>
  mutate(COVID = ifelse(DX == "U071", 1, 0))

# Fit the logistic model
covid.month.fit <- glm(COVID ~ DISCHARGE_MONTH, data = covid_binary_month, family = binomial)

summary(covid.month.fit)

Call:
glm(formula = COVID ~ DISCHARGE_MONTH, family = binomial, data = covid_binary_month)

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -6.500450   0.034265 -189.71   <2e-16 ***
DISCHARGE_MONTH  0.142463   0.003886   36.66   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 83171  on 1498051  degrees of freedom
Residual deviance: 81710  on 1498050  degrees of freedom
AIC: 81714

Number of Fisher Scoring iterations: 8
# The positive coefficient for discharge_month (0.142) and P-value of less than 2e-16 indicates a significant positive association between discharge month and covid diagnosis (as expected). 
# determine whether discharge month is associated with bacterial pneumonia  

bactpneumo_code <- c("J13", "J14", "J150", "J151", "J1520", "J15211", "J1529", "J153", "J154", "J155", "J1561", "J1569", "J157", "J158", "J159", "J160")

# create new column with COVID infection as factor
bactpneumo_binary_month <- dx_month_narm |>
  mutate(bactpneumo = ifelse(DX %in% bactpneumo_code, 1, 0))

# Fit the linear model
bactpneumo.month.fit <- glm(bactpneumo ~ DISCHARGE_MONTH, data = bactpneumo_binary_month, family = binomial)

summary(bactpneumo.month.fit)

Call:
glm(formula = bactpneumo ~ DISCHARGE_MONTH, family = binomial, 
    data = bactpneumo_binary_month)

Coefficients:
                 Estimate Std. Error  z value Pr(>|z|)    
(Intercept)     -6.582895   0.049489 -133.017  < 2e-16 ***
DISCHARGE_MONTH -0.033551   0.006973   -4.811  1.5e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 26074  on 1498051  degrees of freedom
Residual deviance: 26051  on 1498050  degrees of freedom
AIC: 26055

Number of Fisher Scoring iterations: 9
# The negative coefficient for discharge_month (-0.03) and P-value of less than 0.001 indicates a very significant negative association between discharge month and bacterial pneumonia. 
# determine whether primary (DX1) COVID diagnosis is associated with other hospital-associated infections

alldiag_covid_binary <- var_2020 |>
  mutate(COVID = ifelse(rowSums(select(var_2020, 21:50) == "U071", na.rm = TRUE) > 0, 1, 0))
# determine whether COVID is associated with bacterial pneumonia

bactpneumo_code <- c("J13", "J14", "J150", "J151", "J1520", "J15211", "J1529", "J153", "J154", "J155", "J1561", "J1569", "J157", "J158", "J159", "J160")

alldiag_bactpneumo_binary <- alldiag_covid_binary |>
  mutate(bactpneumo = ifelse(rowSums(select(alldiag_covid_binary, 21:50) == bactpneumo_code, na.rm = TRUE) > 0, 1, 0))

alldiag_bactpneumo_factor <- alldiag_bactpneumo_binary %>%
  mutate(COVID = factor(COVID, levels = c(0, 1), labels = c("Negative", "Positive")),
         bactpneumo = factor(bactpneumo, levels = c(0,1), labels = c("Negative", "Positive")))

# Create the plot with labeled factors
ggplot(alldiag_bactpneumo_factor, aes(x = bactpneumo, fill = COVID)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = c("Negative" = "#aaaaaa", "Positive" = "#b20000")) +
  theme_bw() +
  labs(x = "Bacterial Pneumonia", 
       y = "Proportion", 
       fill = "COVID Status")

# Fit the linear model
covid.bactpneumo.fit <- glm(bactpneumo ~ COVID, data = alldiag_bactpneumo_binary, family = binomial)

summary(covid.bactpneumo.fit)

Call:
glm(formula = bactpneumo ~ COVID, family = binomial, data = alldiag_bactpneumo_binary)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -7.2344     0.1049  -68.99  < 2e-16 ***
COVID         1.6850     0.2262    7.45 9.33e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1865.7  on 132693  degrees of freedom
Residual deviance: 1826.3  on 132692  degrees of freedom
AIC: 1830.3

Number of Fisher Scoring iterations: 10
# a positive coefficient (1.69) and a p-value less than 0.001 indicates that there is a very strong positive assocition between bacterial pneumonia and COVID dianosis. 
# determine whether COVID is associated with C. difficile colitis

alldiag_cdiff_binary <- alldiag_covid_binary |>
  mutate(Cdiff = ifelse(rowSums(select(alldiag_covid_binary, 21:50) == "A047", na.rm = TRUE) > 0, 1, 0))

alldiag_cdiff_factor <- alldiag_cdiff_binary %>%
  mutate(COVID = factor(COVID, levels = c(0, 1), labels = c("Negative", "Positive")),
         Cdiff = factor(Cdiff, levels = c(0,1), labels = c("Negative", "Positive")))

# Create the plot with labeled factors
ggplot(alldiag_cdiff_factor, aes(x = Cdiff, fill = COVID)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = c("Negative" = "#aaaaaa", "Positive" = "#b20000")) +
  theme_bw() +
  labs(x = "C. difficile infection", 
       y = "Proportion", 
       fill = "COVID Status")

# Fit the linear model
covid.cdiff.fit <- glm(Cdiff ~ COVID, data = alldiag_cdiff_binary, family = binomial)

summary(covid.cdiff.fit)

Call:
glm(formula = Cdiff ~ COVID, family = binomial, data = alldiag_cdiff_binary)

Coefficients:
            Estimate Std. Error  z value Pr(>|z|)    
(Intercept) -4.89046    0.03270 -149.548   <2e-16 ***
COVID        0.03812    0.14568    0.262    0.794    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 11690  on 132693  degrees of freedom
Residual deviance: 11690  on 132692  degrees of freedom
AIC: 11694

Number of Fisher Scoring iterations: 7
# no association between C. diff colitis and COVID-19 diagnosis 
# determine whether COVID diagnosis is associated with Enterococcus infection

entero_code <- c("A4181", "B952")

alldiag_entero_binary <- alldiag_covid_binary |>
  mutate(Entero = ifelse(rowSums(select(alldiag_covid_binary, 21:50) == entero_code, na.rm = TRUE) > 0, 1, 0))

alldiag_entero_factor <- alldiag_entero_binary %>%
  mutate(COVID = factor(COVID, levels = c(0, 1), labels = c("Negative", "Positive")),
         Entero = factor(Entero, levels = c(0,1), labels = c("Negative", "Positive")))

# Create the plot with labeled factors
ggplot(alldiag_entero_factor, aes(x = Entero, fill = COVID)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = c("Negative" = "#aaaaaa", "Positive" = "#b20000")) +
  theme_bw() +
  labs(x = "Enterococcus infection", 
       y = "Proportion", 
       fill = "COVID Status")

# Fit the linear model
covid.entero.fit <- glm(Entero ~ COVID, data = alldiag_entero_binary, family = binomial)

summary(covid.entero.fit)

Call:
glm(formula = Entero ~ COVID, family = binomial, data = alldiag_entero_binary)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -6.21056    0.06293 -98.686   <2e-16 ***
COVID        0.21349    0.25810   0.827    0.408    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3873.6  on 132693  degrees of freedom
Residual deviance: 3873.0  on 132692  degrees of freedom
AIC: 3877

Number of Fisher Scoring iterations: 9
# no association between Enterococus infection and COVID-19 diagnosis 
# Read in NHCS 2020 emergency department R Dataset
url2 <- "https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHCS/2020/R/nhcs2020ed_r.rds"
nhcs2020ed <- read_rds(url2)
# emergency department dataset
#pull out variables list
var_2020_ed <- nhcs2020ed$variables

#select useful columns
var_2020_ed_select <- select(var_2020_ed, 1:37)

#variable names 
varnames_2020_ed <- colnames(var_2020_ed_select)

#pull out primary diagnosis (DX1)
diag1_2020_ed <- var_2020_ed_select$DX1

#determine the number of primary C. diff infections (ICD-10 diagnosis code: A047)
primaryCdiff_2020_ed <- sum(diag1_2020_ed == "A047", na.rm = TRUE)
primaryCdiff_2020_ed
[1] 151
#there were 151 cases of primary C.diff infection into the emergency department 
#combine all diagnosis
alldiag_2020_ed <- var_2020_ed_select |>
  pivot_longer(
    cols = c(DX1:DX30), 
    values_to = "DX"
  )

#all COVID infections
all_covid_ed <- sum(alldiag_2020_ed =="U071", na.rm = TRUE)
all_covid_ed
[1] 10140
#10,140 total COVID-19 diagnosis in emergency department 

#all C.diff infection
allCdiff_ed <- sum(alldiag_2020_ed == "A047", na.rm = TRUE)
allCdiff_ed
[1] 492
#492 total C.diff infection
#subset COVID-19 patients
covid_patients_ed <- filter(alldiag_2020_ed, DX =="U071")
covid_patients_ed <- covid_patients_ed |>
  filter(!is.na(AGE)) |>
  filter(!is.na(SEX))
#re-level age
covid_patients_ed <- covid_patients_ed |>
  mutate(age.simplified = cut(AGE, 
                              breaks = c(0, 5, 10, 15, 20, seq(30, 100, by = 10)), 
                              right = FALSE, 
                              labels = c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")))

count(covid_patients_ed, age.simplified)
# A tibble: 11 × 2
   age.simplified     n
   <fct>          <int>
 1 less than 5      208
 2 5-9               88
 3 10-14            115
 4 15-19            286
 5 20-29           1124
 6 30-39           1399
 7 40-49           1436
 8 50-59           1658
 9 60-69           1577
10 70-79           1199
11 80-89           1032
count(covid_patients_ed, AGE)
# A tibble: 87 × 2
     AGE     n
   <dbl> <int>
 1     0    86
 2     1    52
 3     2    32
 4     3    19
 5     4    19
 6     5    17
 7     6    14
 8     7    15
 9     8    20
10     9    22
# ℹ 77 more rows
count(covid_patients_ed, SEX)
# A tibble: 2 × 2
  SEX        n
  <fct>  <int>
1 Male    4893
2 Female  5229
count(covid_patients_ed, DISCHARGE_MONTH)
# A tibble: 12 × 2
   DISCHARGE_MONTH     n
             <dbl> <int>
 1               1    17
 2               2    19
 3               3    26
 4               4  1344
 5               5   834
 6               6   613
 7               7   975
 8               8   627
 9               9   427
10              10   803
11              11  1839
12              12  2598
count(covid_patients_ed, DISCHARGE_STATUS)
# A tibble: 9 × 2
  DISCHARGE_STATUS                            n
  <fct>                                   <int>
1 Routine to home                          7395
2 Left against medical advice                80
3 Transfer to short term facility           145
4 Transfer to long term facility            114
5 Home health care                          347
6 Hospice care - home or medical facility   121
7 Other                                     995
8 Dead                                      440
9 <NA>                                      485
#visualization
ggplot(covid_patients_ed, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("COVID-19 cases by sex") +
  theme_bw()

ggplot(covid_patients_ed, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
  ggtitle("COVID-19 cases by age") +
  theme_bw()

ggplot(covid_patients_ed, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  ggtitle("COVID-19 cases by discharge month") +
  theme_bw()

#subset bacterial pneumonia patients
bactpneumo_patients_ed <- filter(alldiag_2020_ed, DX %in% c("J13", "J14", "J150", "J151", "J1520", "J15211", "J1529", "J153", "J154", "J155", "J1561", "J1569", "J157", "J158", "J159", "J160"))
#J13     Pneumonia due to Streptococcus pneumoniae
#J14     Pneumonia due to Hemophilus influenzae
#J150    Pneumonia due to Klebsiella pneumoniae
#J151    Pneumonia due to Pseudomonas
#J1520   Pneumonia due to staphylococcus, unspecified
#J15211  Pneumonia due to Methicillin susceptible Staphylococcus aureus
#J15212  Pneumonia due to Methicillin resistant Staphylococcus aureus
#J1529   Pneumonia due to other staphylococcus
#J153    Pneumonia due to streptococcus, group B
#J154    Pneumonia due to other streptococci
#J155    Pneumonia due to Escherichia coli
#J1561   Pneumonia due to Acinetobacter baumannii
#J1569   Pneumonia due to other Gram-negative bacteria
#J157    Pneumonia due to Mycoplasma pneumoniae
#J158    Pneumonia due to other specified bacteria
#J159    Unspecified bacterial pneumonia
#J160    Chlamydial pneumonia
#re-level age
bactpneumo_patients_ed <- bactpneumo_patients_ed |>
  mutate(age.simplified = cut(AGE, 
                              breaks = c(0, 5, 10, 15, 20, seq(30, 100, by = 10)), 
                              right = FALSE, 
                              labels = c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")))

count(bactpneumo_patients_ed, age.simplified)
# A tibble: 11 × 2
   age.simplified     n
   <fct>          <int>
 1 less than 5       38
 2 5-9               11
 3 10-14              7
 4 15-19              6
 5 20-29             17
 6 30-39             28
 7 40-49             54
 8 50-59            101
 9 60-69            149
10 70-79            137
11 80-89            119
count(bactpneumo_patients_ed, SEX)
# A tibble: 2 × 2
  SEX        n
  <fct>  <int>
1 Male     380
2 Female   287
count(bactpneumo_patients_ed, DISCHARGE_MONTH)
# A tibble: 12 × 2
   DISCHARGE_MONTH     n
             <dbl> <int>
 1               1    78
 2               2    75
 3               3    72
 4               4    65
 5               5    50
 6               6    47
 7               7    44
 8               8    31
 9               9    44
10              10    37
11              11    51
12              12    73
count(bactpneumo_patients_ed, DISCHARGE_STATUS)
# A tibble: 9 × 2
  DISCHARGE_STATUS                            n
  <fct>                                   <int>
1 Routine to home                           272
2 Left against medical advice                10
3 Transfer to short term facility             9
4 Transfer to long term facility             48
5 Home health care                           96
6 Hospice care - home or medical facility    31
7 Other                                      87
8 Dead                                       87
9 <NA>                                       27
#visualization
ggplot(bactpneumo_patients_ed, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("Bacterial penumonia cases by sex") +
  theme_bw()

ggplot(bactpneumo_patients_ed, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
    ggtitle("Bacterial penumonia cases by age") +
  theme_bw()

ggplot(bactpneumo_patients_ed, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
    ggtitle("Bacterial penumonia cases by discharge month") +
  theme_bw()

#subset C.diff patients
Cdiff_patients_ed <- filter(alldiag_2020_ed, DX == "A047")
#plot number of cases by demographics
#re-level age
Cdiff_patients_ed <- Cdiff_patients_ed |>
  mutate(age.simplified = cut(AGE, 
                              breaks = c(0, 5, 10, 15, 20, seq(30, 100, by = 10)), 
                              right = FALSE, 
                              labels = c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")))

count(Cdiff_patients_ed, age.simplified)
# A tibble: 11 × 2
   age.simplified     n
   <fct>          <int>
 1 less than 5        3
 2 5-9                4
 3 10-14              4
 4 15-19              9
 5 20-29             28
 6 30-39             25
 7 40-49             26
 8 50-59             55
 9 60-69            104
10 70-79            131
11 80-89            103
count(Cdiff_patients_ed, SEX)
# A tibble: 2 × 2
  SEX        n
  <fct>  <int>
1 Male     211
2 Female   281
count(Cdiff_patients_ed, DISCHARGE_MONTH)
# A tibble: 12 × 2
   DISCHARGE_MONTH     n
             <dbl> <int>
 1               1    52
 2               2    40
 3               3    39
 4               4    28
 5               5    33
 6               6    29
 7               7    33
 8               8    37
 9               9    53
10              10    44
11              11    47
12              12    57
count(Cdiff_patients_ed, DISCHARGE_STATUS)
# A tibble: 9 × 2
  DISCHARGE_STATUS                            n
  <fct>                                   <int>
1 Routine to home                           204
2 Left against medical advice                 7
3 Transfer to short term facility             4
4 Transfer to long term facility             17
5 Home health care                           71
6 Hospice care - home or medical facility    18
7 Other                                     115
8 Dead                                       24
9 <NA>                                       32
#visualization
ggplot(Cdiff_patients_ed, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("C. difficile infection by sex") +
  theme_bw()

ggplot(Cdiff_patients_ed, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
  ggtitle("C. difficile infection by age") +
  theme_bw()

ggplot(Cdiff_patients_ed, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  ggtitle("C. difficile infection by discharge month") +
  theme_bw()

4 Results

Describe your results and include relevant tables, plots, and code/comments used to obtain them. You may refer to the Section 3 as needed. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you’d like, but this is not required.

Conclusions

5 Conclusion

This the conclusion. The Section 4 can be invoked here.